A Systematic Review of Stemmers of Indian and Non-Indian Vernacular Languages

Authors

Abstract

The stemming process is crucial in the pre-processing step of natural language processing, and a stemmer oversees this process. It facilitates the extraction of morphological variants of a root or base word from a provided word. Over time, several stemmers for various vernacular languages have been proposed. However, very few research studies have comprehensively investigated these available stemmers. This paper makes multifold contributions. First, we discuss 15 Indian and 17 non-Indian stemmers, describing their key points, benefits, and drawbacks. All the stemmers built for Indian languages are covered in this study; for non-Indian languages, the stemmers of commonly spoken languages are covered. Second, we present a language-wise comparative analysis of the stemmers based on our identified parameters. Third, we discuss wordnets and dictionaries of different languages. Fourth, we provide details of datasets for different languages. Fifth, we discuss challenges of the existing stemmers and future directions for researchers. The study reveals that research has been carried out for influential languages such as English, Arabic, and Urdu. On the other hand, languages with limited resources, such as Farsi, Polish, Odia, Amharic, and others, have received the least research attention. Moreover, most rigorous stemmers suffer from over-stemming errors. With a complete catalogue of stemmers, this study aims to assist researchers and professionals working in information retrieval, semantic annotation, meaning disambiguation, and ontology learning.
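Over-stemming happens when words with different meanings are reduced to the same stem. The Python sketch below illustrates this and is minimal by design. It assumes a small, hypothetical English suffix list, which is not taken from any stemmer covered in the review. With that list, an aggressive suffix stripper conflates *universe*, *university*, and *universal*. The sketch also counts such cross-meaning collisions against a hand-made grouping of words by meaning.

```python
# Hypothetical English suffix list for illustration only; sorted so the
# longest suffixes are tried first. Not taken from any stemmer in the review.
SUFFIXES = sorted(["ities", "ity", "al", "es", "ing", "ed", "s", "e"], key=len, reverse=True)

MIN_STEM = 3  # assumed minimum stem length so short words are not stripped away


def stem(word: str) -> str:
    """Strip the first (longest) matching suffix, keeping at least MIN_STEM characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def over_stemmed_pairs(groups: list[list[str]]) -> list[tuple[str, str]]:
    """Return word pairs from *different* meaning groups that collapse to the same stem."""
    labelled = [(w, gi) for gi, group in enumerate(groups) for w in group]
    pairs = []
    for i, (w1, g1) in enumerate(labelled):
        for w2, g2 in labelled[i + 1:]:
            if g1 != g2 and stem(w1) == stem(w2):
                pairs.append((w1, w2))
    return pairs


if __name__ == "__main__":
    print([stem(w) for w in ("universe", "university", "universal")])
    print(over_stemmed_pairs([["universe"], ["university", "universities"], ["universal"]]))
```

All three words stem to `univers`. The grouping above therefore yields five over-stemmed pairs. The pair *university*/*universities* is not counted, because those two words belong to the same meaning group.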


Similar resources

Literature Review: Stemming Algorithms for Indian and Non-Indian Languages

I. Introduction: Stemming plays an important role in Information Retrieval Systems (IRS), improving retrieval performance for all languages. The goal of stemming is to reduce inflectional and derivational variant forms of a word to a common base form. A stemmer can transform morphologically identical words to the root word without performing a morphological analysis of that term. ...
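Reducing variant forms to a common base is what lets a retrieval system match a query against documents that use a different inflection. The following is a minimal sketch of that idea. It assumes whitespace tokenization, a hypothetical suffix list, and a simple Boolean AND search, none of which come from the paper. Documents are indexed by stem, so a query for *connecting* also retrieves documents containing *connected* or *connections*.

```python
from collections import defaultdict

# Hypothetical suffix list, checked in order (longest first); illustration only.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stem to the IDs of documents containing any of its variants."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():  # assumed: plain whitespace tokenization
            index[stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return documents containing a variant of every query term (Boolean AND)."""
    stems = [stem(t) for t in query.split()]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for s in stems[1:]:
        result &= index.get(s, set())
    return result


if __name__ == "__main__":
    docs = {
        "d1": "the network connected two servers",
        "d2": "connections failed during testing",
        "d3": "a printer was installed",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "connecting")))
```

Without stemming, the query *connecting* would match neither d1 nor d2. With stem-keyed indexing, both documents are retrieved. This is the recall gain the introduction attributes to stemming.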


Overview of Stemming Algorithms for Indian and Non-Indian Languages

Stemming is a pre-processing step in text mining applications as well as a very common requirement of natural language processing functions. Stemming is the process of reducing inflected words to their stem. The main purpose of stemming is to reduce the different grammatical forms of a word, such as its noun, adjective, verb, and adverb forms, to its root form. Stemming is widely used in Inform...


A Survey of Common Stemming Techniques and Existing Stemmers for Indian Languages

Stemming is an operation that relates the morphological variants of a word. The purpose of stemming is to obtain the stem or radix of words that are not found in the dictionary. If the stemmed word is present in the dictionary, it is a genuine word; otherwise, it may be a proper name or an invalid word. Stemming is the process of reducing inflected or sometimes derived words to their stem, base o...
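The dictionary check described above can be sketched in a few lines of Python. The sketch is minimal and rests on assumptions. It uses a toy, hypothetical dictionary and suffix list, where a real system would load a lexicon or wordnet for the target language. The stemming rule is a simple first-match suffix strip and is not any specific surveyed stemmer. The logic works as follows:

- A word already in the dictionary is left as is.
- Otherwise, the word is stemmed.
- If the stem is in the dictionary, the word is labelled genuine; if not, it is flagged as a possible proper name or invalid word.

```python
# Toy, hypothetical dictionary and suffix list; a real system would load a lexicon.
DICTIONARY = {"walk", "talk", "play", "happy"}
SUFFIXES = ("ing", "ed", "s")


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix, never stripping the whole word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def classify(word: str) -> tuple[str, str]:
    """Return (stem, label) following the dictionary check described above."""
    w = word.lower()
    if w in DICTIONARY:
        return w, "dictionary word"  # already a base form; no stemming needed
    s = strip_suffix(w)
    if s in DICTIONARY:
        return s, "genuine word"  # inflected form of a known word
    return s, "proper name or invalid word"


if __name__ == "__main__":
    for w in ("walking", "played", "happy", "Kolkata"):
        print(w, classify(w))
```

With this toy lexicon, *walking* and *played* are labelled genuine words. *Kolkata* has no matching suffix and is absent from the dictionary, so it is flagged as a proper name or invalid word.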


A Survey on text categorization of Indian and non-Indian languages using supervised learning techniques

Text categorization plays an important role in the text mining field. It is the process of assigning documents to predefined categories. Automatic text categorization is an important task because of the large number of electronic documents. This paper presents a survey of text categorization for Indian and non-Indian languages. Very little work has been done in text cat...


A Comprehensive Analyze of Stemming Algorithms for Indian and Non-indian Languages

Stemming is a technique for reducing inflected words to their stem or root form. It applies to both suffixes and prefixes. Stemming is a preprocessing step in text mining applications and is commonly used in Natural Language Processing (NLP). A stemmer can transform morphologically identical words to the root word without performing a morphological analysis of th...
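The snippet notes that stemming applies to both prefixes and suffixes. The Python sketch below strips at most one prefix and then at most one suffix. It is minimal and relies on stated assumptions:

- The affix lists are hypothetical.
- The stripping order (prefix first, then suffix) and the minimum stem length are assumptions.
- Real stemmers for a given language use their own rule sets and orderings.

```python
# Hypothetical affix lists for illustration; not taken from any surveyed stemmer.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "ing", "ed", "ly", "s")
MIN_STEM = 3  # assumed lower bound on stem length after each removal


def strip_affixes(word: str) -> str:
    """Remove at most one prefix, then at most one suffix."""
    w = word.lower()
    for prefix in PREFIXES:
        if w.startswith(prefix) and len(w) - len(prefix) >= MIN_STEM:
            w = w[len(prefix):]
            break  # strip at most one prefix
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            w = w[: -len(suffix)]
            break  # strip at most one suffix
    return w


if __name__ == "__main__":
    for w in ("unkindness", "rebuilding", "kindly", "reading"):
        print(w, "->", strip_affixes(w))
```

These rules reduce *unkindness* and *kindly* to *kind*, and *rebuilding* to *build*. However, they also turn *reading* into *ading*, because the leading "re" is not a prefix there. Naive prefix removal easily produces errors like this, which is why it needs tighter constraints than the ones shown here.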



Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3604612